Using Temporal Neighborhoods to Adapt Function Approximators in Reinforcement Learning
Authors
Abstract
To avoid the curse of dimensionality, function approximators are used in reinforcement learning rather than learning a separate value for each individual state. To make better use of computational resources (basis functions), many researchers are investigating ways to adapt the basis functions during the learning process so that they better fit the value-function landscape. Here we introduce temporal neighborhoods: small groups of states that experience frequent intragroup transitions during on-line sampling. We then form basis functions along these temporal neighborhoods. Empirical evidence is provided which demonstrates the effectiveness of this scheme. We discuss a class of RL problems for which this method is plausible.
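To make the construction concrete, here is a minimal Python sketch of the idea as we read it; the clustering rule, threshold, and all names are our own illustrative assumptions, not the authors' implementation. It counts state-to-state transitions gathered during on-line sampling, greedily groups states whose mutual transition counts exceed a threshold into temporal neighborhoods, and defines one indicator basis function per neighborhood.

```python
from collections import defaultdict

def build_temporal_neighborhoods(transitions, min_count=5):
    """Greedily group states that transition to each other often.

    `transitions` is a list of (s, s_next) pairs observed on-line.
    Pairs whose transition count reaches `min_count` are placed in
    the same neighborhood (an illustrative criterion only; full
    merging of already-assigned neighborhoods is omitted for brevity).
    """
    count = defaultdict(int)
    for s, s2 in transitions:
        count[(s, s2)] += 1

    neighborhood_of = {}  # state -> neighborhood id
    next_id = 0
    for (s, s2), c in sorted(count.items(), key=lambda kv: -kv[1]):
        if c < min_count or s == s2:
            continue
        a, b = neighborhood_of.get(s), neighborhood_of.get(s2)
        if a is None and b is None:
            neighborhood_of[s] = neighborhood_of[s2] = next_id
            next_id += 1
        elif a is None:
            neighborhood_of[s] = b
        elif b is None:
            neighborhood_of[s2] = a
    return neighborhood_of, next_id

def basis_features(state, neighborhood_of, n_neighborhoods):
    """One binary basis function per temporal neighborhood."""
    phi = [0.0] * n_neighborhoods
    nb = neighborhood_of.get(state)
    if nb is not None:
        phi[nb] = 1.0
    return phi
```

With such indicator features, the value function is approximated linearly as V(s) ≈ w · φ(s), and the weights w can be trained with any standard TD method.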
Similar Articles
A NEAT Way for Evolving Echo State Networks
The Reinforcement Learning (RL) paradigm is an appropriate formulation for an agent's goal-directed, sequential decision making. For RL methods to perform well in difficult, complex, real-world tasks, however, the choice and architecture of an appropriate function approximator are of crucial importance. This work presents a method of automatically discovering such function approximators, ...
Evolutionary Function Approximation for Reinforcement Learning
Temporal difference methods are theoretically grounded and empirically effective methods for addressing reinforcement learning problems. In most real-world reinforcement learning tasks, TD methods require a function approximator to represent the value function. However, using function approximators requires manually making crucial representational decisions. This paper investigates evolutionary...
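The evolutionary idea referenced here can be illustrated with a toy search over feature subsets. This is a deliberately simplified stand-in, not the paper's method (which evolves neural-network representations); `evaluate`, the mutation rate, and all sizes below are hypothetical.

```python
import random

def evolve_representation(evaluate, n_features=20, pop_size=10,
                          generations=25, seed=0):
    """Toy (1+lambda)-style search over binary feature masks.

    `evaluate(mask)` is assumed to train a TD learner using only the
    features enabled by `mask` and return its average return.
    """
    rng = random.Random(seed)
    best = [rng.random() < 0.5 for _ in range(n_features)]
    best_fit = evaluate(best)
    for _ in range(generations):
        for _ in range(pop_size):
            # Flip each bit with probability 0.1 (mutation).
            child = [b != (rng.random() < 0.1) for b in best]
            fit = evaluate(child)
            if fit > best_fit:
                best, best_fit = child, fit
    return best, best_fit
```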
Transfer Learning via Inter-Task Mappings for Temporal Difference Learning
Temporal difference (TD) learning (Sutton and Barto, 1998) has become a popular reinforcement learning technique in recent years. TD methods, relying on function approximators to generalize learning to novel situations, have had some experimental successes and have been shown to exhibit some desirable properties in theory, but the most basic algorithms have often been found slow in practice. Th...
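The core update these methods share is TD learning with a linear function approximator, as given in Sutton and Barto (1998); it is compact enough to state directly. The function name and step-size defaults below are our own.

```python
import numpy as np

def td0_update(w, phi_s, phi_next, reward, alpha=0.1, gamma=0.99):
    """One linear TD(0) step on the weight vector w.

    V(s) is approximated as w . phi(s); the temporal-difference
    error is delta = r + gamma * V(s') - V(s).
    """
    delta = reward + gamma * np.dot(w, phi_next) - np.dot(w, phi_s)
    return w + alpha * delta * phi_s
```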
High-accuracy value-function approximation with neural networks applied to the acrobot
Several reinforcement-learning techniques have already been applied to the Acrobot control problem, using linear function approximators to estimate the value function. In this paper, we present experimental results obtained by using a feedforward neural network instead. The learning algorithm used was model-based continuous TD(λ). It generated an efficient controller, producing a high-accuracy ...
Empirical Comparison of Gradient Descent and Exponentiated Gradient Descent in
This report describes a series of results using the exponentiated gradient descent (EG) method recently proposed by Kivinen and Warmuth. Prior work is extended by comparing speed of learning on a nonstationary problem and on an extension to backpropagation networks. Most significantly, we present an extension of the EG method to temporal-difference and reinforcement learning. This extension is co...
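The contrast between the two update rules is easy to state. Below is a minimal sketch, assuming (as in Kivinen and Warmuth's formulation) that the EG weight vector is constrained to the probability simplex; the function names and step size are illustrative.

```python
import numpy as np

def gd_step(w, grad, eta=0.1):
    """Ordinary (additive) gradient-descent update."""
    return w - eta * grad

def eg_step(w, grad, eta=0.1):
    """Exponentiated-gradient update (Kivinen and Warmuth):
    multiplicative in the gradient, then renormalized so that w
    remains a nonnegative vector summing to one."""
    w_scaled = w * np.exp(-eta * grad)
    return w_scaled / w_scaled.sum()
```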